Abstract

Background: Febrile neutropenia is a frequently occurring and occasionally life-threatening complication of 
treatment for childhood cancer. Many biomarkers have been proposed as predictors of adverse events. We aimed 
to undertake a systematic review and meta-analysis to summarize evidence on the discriminatory ability of initial 
serum biomarkers of febrile neutropenic episodes in children and young people. 

Methods: This review was conducted in accordance with the Center for Reviews and Dissemination Methods, 
using three random effects models to undertake meta-analysis. It was registered with the HTA Registry of
Systematic Reviews, CRD32009100485.

Results: We found 25 studies exploring 14 different biomarkers, assessed in 3,585 episodes of febrile
neutropenia. C-reactive protein (CRP), pro-calcitonin (PCT), and interleukin-6 (IL6) were subject to quantitative
meta-analysis, which revealed huge inconsistencies and heterogeneity in the studies included in this review. Only
CRP has been evaluated for its added value over the predictive value of simple clinical decision rules.

Conclusions: The limited data available describing the predictive value of biomarkers in the setting of pediatric 
febrile neutropenia mean firm conclusions cannot yet be reached, although the use of IL6, IL8 and procalcitonin
warrants further study.



Background 

With multi-modality therapies, children with malignancy 
have an excellent chance of survival, with overall rates 
approaching 75% [1]. Deaths are largely due to their dis- 
ease, but around 16% of deaths are from complications 
of therapy [2,3] . This proportion depends on the under- 
lying malignancy, and the risk of death from infection 
remains high in some groups, for example, acute myeloid
leukemia [4]. Robust risk stratification, which reliably
predicted those children at high risk of complications,
could target more aggressive management at these children,
while children at very low risk of having a significant
infection could be treated with reduced intensity
and/or duration of hospitalized antibiotic therapy [5].
There are a wide range of differing approaches to this
risk stratification, largely built on simple clinical data 
[6-8], demonstrating only moderate discriminatory 
ability. 

The ability of specific serum biomarkers to predict 
adverse consequences in patients with febrile neutrope- 
nia has been explored, for example, C-reactive protein 
(CRP), pro-calcitonin (PCT), interleukin-6 (IL6) or inter- 
leukin-8 (IL8) [9-12]. These studies have been small in 
the numbers of patients and episodes and the research- 
ers could not reach definitive conclusions. Drawing 
these reports together and synthesizing their results 
should improve our understanding of their clinical 
usefulness. 

Although systematic reviews have been conducted 
previously in adults [13] and non-immunocompromised
children [14,15], their results are difficult to compare. 
There are data to suggest children and adults with neutropenic
fever vary in the nature of the infections which
afflict them [16], implying any review needs to take into 
account the specific population under study. 

This review aimed to identify, critically appraise and 
synthesize information on the use of biomarkers at 
initial evaluation for the prediction of the outcome of 
febrile neutropenic episodes in children/young adults 
and to highlight important problems in the current 
methods used in such analyses. 

Methods 

The review was conducted in accordance with "Systema- 
tic reviews: CRD's guidance for undertaking reviews in 
health care" [17] and registered on the HTA Registry of 
Systematic Reviews: CRD32009100485. It sought studies 
which evaluated the diagnostic ability of serum biomar- 
kers of inflammation/infection in children or young peo- 
ple aged 0 to 18 years of age, taken at the onset (within 
12 hours) of an episode of febrile neutropenia. Both pro- 
spective and retrospective cohorts were included, but 
those using a case-control approach were excluded as 
these have been previously shown to exaggerate diag- 
nostic accuracy estimates [18]. 

Search strategy and selection criteria 

An electronic search strategy (See Additional file 1) was 
developed to examine a range of databases from incep- 
tion to February 2009, including MEDLINE, EMBASE, 
CINAHL, Cochrane Database of Systematic Reviews, 
Database of Abstracts of Reviews of Effects, Health 
Technology Assessment Database, Cochrane Central 
Register of Controlled Trials, Conference Proceedings 
Citation Index - Science and LILACS. 

Reference lists of relevant systematic reviews and 
included articles were reviewed for further relevant arti- 
cles. Published and unpublished studies were sought 
without language restrictions. Non-English language stu- 
dies were translated. Two reviewers independently 
screened the titles and abstracts of studies for inclusion, 
and then the full text of retrieved articles. Disagree- 
ments were resolved by consensus. 

The validity of each study was assessed using 11 of the
14 questions from the Quality Assessment of Diagnostic 
Accuracy Studies (QUADAS) assessment tool for diag- 
nostic accuracy studies [19] (see footnote of Additional 
file 2). The QUADAS tool was adapted specifically for 
the review, as suggested by current guidance [20], omit- 
ting questions on "time between index and reference 
test", "intermediate results" and "explanation of withdra- 
wals". The index test (biomarkers) and reference test 
were always examined within a single episode of febrile 
neutropenia, making this question indiscriminating. 
Tests of biomarkers are not reported as 'positive' and
'negative', and so "intermediate" results are not found in
these types of studies. Rather than addressing "incom- 
plete data" as a validity item, it was addressed in the 
data analysis. 

Data were extracted by one researcher using a standar- 
dized data extraction form and accuracy confirmed inde- 
pendently by a second; except with foreign language 
papers where a translator working with a reviewer under- 
took the extraction. Clinical data extracted included par- 
ticipant demographics, geographical location, participant 
inclusion/exclusion criteria and antibiotics used. Methodological
information included methods used to adjust
the predictive estimate, including the variables consid- 
ered, and methods of analysis. The reference standard 
outcomes considered relevant included survival, need for 
intensive/high-dependency care, single organ impair- 
ment, invasive bacterial or fungal infection, presence of 
documented infection, including radiologically confirmed 
pneumonia, and duration of hospitalization. The sensitiv- 
ity and specificity of the biomarkers were extracted, pre- 
ferentially as 2 x 2 tables comparing dichotomized test 
results against the reference standard. Where data were 
only presented as mean and standard deviation, conver- 
sion was undertaken using the assumption of Normality 
and deriving a 2 x 2 table for cut-offs reported by other 
studies (Anzures, Cochrane Colloquium Freiburg 2008). 
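
The conversion described above can be illustrated with a minimal Python sketch. It assumes that the biomarker is Normally distributed within each outcome group and that a result above the cut-off counts as test positive; the function names and the numbers in the example are hypothetical and are not taken from any included study or from the reviewers' own procedure.

```python
from math import erf, sqrt


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a Normal(mean, sd) variable."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def two_by_two_from_summary(mean_pos, sd_pos, n_pos,
                            mean_neg, sd_neg, n_neg, cutoff):
    """Expected 2 x 2 counts, treating 'biomarker > cutoff' as test positive.

    *_pos: episodes with the reference-standard outcome.
    *_neg: episodes without the outcome.
    Counts are expected values, so they need not be whole numbers.
    """
    sensitivity = 1.0 - normal_cdf(cutoff, mean_pos, sd_pos)
    specificity = normal_cdf(cutoff, mean_neg, sd_neg)
    tp = sensitivity * n_pos
    tn = specificity * n_neg
    return {
        "TP": tp, "FN": n_pos - tp,
        "FP": n_neg - tn, "TN": tn,
        "sensitivity": sensitivity, "specificity": specificity,
    }


# Hypothetical summary statistics, for illustration only.
table = two_by_two_from_summary(mean_pos=90, sd_pos=40, n_pos=30,
                                mean_neg=40, sd_neg=25, n_neg=70,
                                cutoff=50)
print(table)
```

Because biomarker distributions are often skewed, the Normality assumption is a simplification, a concern the authors themselves raise when discussing this procedure.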

Methods of analysis/synthesis 

Quantitative synthesis was undertaken for studies which 
tested the same diagnostic test for similar clinical out- 
comes and, where appropriate, was investigated for 
sources of heterogeneity. 

Three approaches were used for meta-analysis. The 
first approach (Method 1) pooled data from the most 
commonly reported threshold, using a single data point 
from each study that provided relevant information, for 
example, each study reporting serum CRP > 50 mg/dL. 
This was expressed as the average test sensitivity and 
specificity, with a 95% confidence interval. This was cal- 
culated by fitting the standard bivariate random effects 
model using STATA (version 10) [21] with metandi [22] 
and midas [23] for analyses of four or more studies; for 
those with fewer than four studies a random effects linear
regression was directly fitted using xtmelogit. The
bivariate model is the most commonly used technique 
in diagnostic meta-analysis, and has benefits of being 
easily interpretable, as it provides a point estimate of the 
test accuracy in this context for a defined cut-off value, 
and is technically straightforward to undertake. Its 
weaknesses lie in the partial use of data from all the
included studies (since accuracy at multiple test cut-offs
was available from many studies), which may lead to
reduced power and consequent imprecision, and 
increased risk of bias from a selective use of data. 
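
As an illustration of the study-level quantities that enter a bivariate meta-analysis, the following Python sketch computes the logit-transformed sensitivity and specificity of a single study, with their approximate variances, from its 2 x 2 table. It is a simplified illustration under the approximate Normal within-study formulation, not the Stata routines used in this review; the function name and the continuity correction of 0.5 are assumptions made for the example.

```python
from math import log


def logit_accuracy(tp, fn, fp, tn, correction=0.5):
    """Logit sensitivity and specificity of one study, with variances.

    A continuity correction is added to every cell only when at least
    one cell is zero, so that the logits remain finite.
    """
    if min(tp, fn, fp, tn) == 0:
        tp, fn, fp, tn = (x + correction for x in (tp, fn, fp, tn))
    logit_sens = log(tp / fn)        # logit of tp / (tp + fn)
    logit_spec = log(tn / fp)        # logit of tn / (tn + fp)
    var_sens = 1.0 / tp + 1.0 / fn   # large-sample variance of a logit
    var_spec = 1.0 / tn + 1.0 / fp
    return logit_sens, var_sens, logit_spec, var_spec


# Hypothetical study: 20 true positives, 20 false negatives,
# 10 false positives, 30 true negatives.
print(logit_accuracy(20, 20, 10, 30))
```

The bivariate model then treats each study's pair of logits as a draw from a common bivariate Normal distribution, which allows for correlation between sensitivity and specificity across studies.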

The second approach (Method 2) again pooled one
data point from each study, but combined information 
from multiple thresholds, for example, serum CRP > 40 
mg/dL, > 50 mg/dL and > 90 mg/dL, and the output 
was expressed as a hierarchical summary receiver opera- 
tor curve (HSROC). The HSROC describes the relation- 
ship between sensitivity and specificity derived from the 
individual receiver operator curves (ROC) of each study. 
In this way, it describes the 'average' relationship 
between a continuous cut-off value and discriminatory 
ability in the 'average' population. This increases the 
information used in the meta-analysis and better repre- 
sents the data. The same routines were used in STATA 
(version 10) [21] to produce these estimates. This 
approach is again technically straightforward to perform, 
and the output allows clinicians to estimate how chan- 
ging thresholds will alter the diagnostic utility of the 
test under study. Its weaknesses relate to the difficulty 
in interpreting exactly what performance is associated 
with each cut-off level, and its lack of explicit inclusion 
of threshold data when producing the curve. 

The third analysis (Method 3) allowed multiple data 
points from multiple thresholds from each study to be 
included, and was undertaken using a multinomial ran- 
dom effects method deriving proportions of the popula- 
tion with/without the outcome at each cut-off level of 
the biomarkers. These were then used to derive likeli- 
hood ratios for each level [24]. This provides the richest 
model, including all of the available data from the stu- 
dies and should produce the clearest possible descrip- 
tions of the predictive value of the biomarkers. This was 
accomplished using a previously published method [8] 
and non-informative priors. Analyses were undertaken 
using WinBUGS 1.4.3 [25]. The code is available upon 
request. This method is theoretically superior to the 
other methods, as it includes all of the available data, 
unlike Method 1, explicitly uses the threshold values, 
unlike Method 2, and produces threshold-specific esti- 
mates of diagnostic test performance, which can be 
interpreted directly by clinicians. It is the most techni- 
cally challenging of all the methods used, requiring spe- 
cific code to be written for each analysis, rather than the 
use of easily available software packages. 
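
The step from stratum proportions to likelihood ratios can be sketched in a few lines of Python. This is only the final arithmetic, applied to hypothetical counts; it does not reproduce the multinomial random effects model fitted in WinBUGS, and the function name is an assumption made for the example.

```python
def stratum_likelihood_ratios(with_outcome, without_outcome):
    """Likelihood ratio for each biomarker stratum.

    with_outcome[k] and without_outcome[k] are the numbers of episodes
    with and without the outcome whose biomarker value falls in stratum k
    (strata are the intervals between successive cut-offs).
    """
    total_with = sum(with_outcome)
    total_without = sum(without_outcome)
    ratios = []
    for d, nd in zip(with_outcome, without_outcome):
        p_with = d / total_with          # P(stratum | outcome)
        p_without = nd / total_without   # P(stratum | no outcome)
        ratios.append(p_with / p_without if p_without > 0 else float("inf"))
    return ratios


# Hypothetical counts in four strata, from lowest to highest biomarker level.
print(stratum_likelihood_ratios([5, 10, 15, 20], [50, 30, 15, 5]))
```

With a well-behaved marker the likelihood ratios rise steadily from the lowest to the highest stratum, which is what allows them to be interpreted directly at the bedside.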

Heterogeneity between study results was explored 
through consideration of study populations, design, pre- 
dictor variables and outcomes. Meta-regression was not 
undertaken due to the small number of studies. When 
quantitative synthesis was not possible, a narrative 
approach was used to synthesize the information. 

Results 

Three hundred and sixty-eight articles were initially
reviewed, and 72 retrieved for more detailed examination.
Twenty-five articles provided quantitative outcome
data in the form required for the review (see Additional
file 3). The included studies comprised 2,089 patients and
over 3,585 episodes, assessing 14 different markers of 
inflammation or infection (see Table 1). The study out- 
comes were grouped into: bacteremia, invasive fungal 
infection, significant/documented bacterial infection, 
sepsis and death. The population in the studies varied, 
with most being a mixture of hematological and solid 
malignancies, and very little data from stem cell trans- 
plant recipients (see Table 2 for further detail). Thirteen 
of these contributed to 1 or more meta-analyses while 
the remaining 12 studies did not provide data which
could be included in any meta-analysis (see Figure 1).
Three biomarkers and 2 outcomes could be included in 
the meta-analysis: 11 studies provided data on CRP 
[9,26-35] and documented infection. Four studies pro- 
vided data on PCT [28,29,31,33] and documented infec- 
tion. Four provided data on IL6 [31,36-38] and 
documented infection or gram negative bacteremia. 

Quality assessment 

The studies varied in quality; see Additional file 2. The
major deficiency in most studies was a failure to
report whether the marker test and outcomes were interpreted
blind to each other. One study [26] assessing CRP
demonstrated a potential contamination of the reference 
standard with the diagnostic test: the outcome included 
CRP > 150 mg/dl. One short report did not detail the 
exact outcome used [39]. Twenty different definitions of 
'febrile neutropenia' were described, including six definitions
of neutropenia ranging from < 200 cells/mm³ to <
1,000 cells/mm³; four definitions of peak fever, from >
37.5°C to > 39°C; and six of sustained temperature, from 
> 38°C to > 38.5°C over varying durations. There were a 
total of 14 combinations to define 'febrile'. 



Table 1 Summary of biomarkers reported across all included studies

Biomarker | Total studies
C reactive protein | 20
Interleukin 6 | 10
Interleukin 8 | 10
Procalcitonin | 8
Tumor necrosis factor α | 2
Interleukin 10 | 1
Monocyte chemoattractant protein-1 | 1
Erythrocyte sedimentation rate | 1
Adenosine deaminase | 1
Serum Amyloid A Protein | 1
Interleukin 1 | 1
Interleukin 5 | 1
Interleukin 12 | 1
Interleukin 2 receptor | 1

Table 2 Details of biomarkers, patients and endpoints in 25 included studies

Citation | Underlying conditions | Markers studied | Number of patients | Number of episodes | Endpoints studied | Comments on endpoints
Ammann 2003 | Pre-B-cell ALL = 94, Other diagnosis = 191 | CRP | 111 | 285 | Significant bacterial infection | Defined as death from bacterial infection, a positive culture of normally sterile body fluids, radiologically proven pneumonia, clinically unequivocal diagnosis of a bacterial infection, or a serum CRP above 150 mg/L as an indirect sign suggesting severe bacterial infection
Barnes 2002 | No data given | PCT | 37 | 39 | Length of stay | Stay of < 5d or > 5d
de Bont 1999 | ALL = 8, AML = 20, CML = 2, Lymphoma = 16, Solid tumor = 7 | CRP, IL6, IL8 | 19 | 72 | Bacteremia |
Diepold 2008 | ALL = 21, AML = 1, JMML = 1, Relapsed AML after SCT = 1, Solid tumor = 39, Hematological disorder after SCT = 4, Hematological disorder without SCT = 1 | CRP, IL6, IL8 | 69 | 123 | Bacteremia; Fever lasting > 5d but culture -ve |
Dylewska 2005 a and b | No data given | CRP, PCT | 66 | 108 | Bacteremia; Clinically defined infections (UTI, neurological, GI or respiratory); Microbiologically defined other infection | FUO was the default category
El-Maghraby 2007 | ALL = 37, AML = 39, Lymphoma = 39 | CRP, IL8, MCP | 76 | 85 | Bacteremia or clinically documented infection |
Hatzistilianou 2007 | All patients had ALL | CRP, PCT | 29 | 94 | Microbiological or clinically documented infection (excludes viral) |
Heney 1992 | ALL = 17, AML = 10, Lymphoma = 2, Solid tumor = 18 | CRP, IL6 | 33 | 47 | Bacteremia |
Hitoglou-Hatzi 2005 | All patients had ALL | CRP, PCT, tADA | 67 | Not stated | Significant bacterial infection |
Hodge 2006 | No overall data given | IL8, IL5 | 31 | 31 | Positive blood culture |
Katz 1992 | Haematological malignancy = 82, Solid tumor = 40 | CRP | 74 | 122 | Clinically or bacteriologically documented infection; Septicemia (+ve blood cultures and unwell clinical appearance) |
Kitanovski 2006 | Hematological malignancy = 50, Solid tumor = 18 | CRP, PCT, IL6 | 32 | 68 | Bacteremia and clinical sepsis; Clinically or microbiologically documented local infection |
... 1999 | ... = 5, Solid tumor = 27 | CRP, PCT, IL6 | 56 | 121 | ... documented infection; Fungal infection; Bacteremia (gram- type) | FUO was the default category
Lehrnbecher 2004 | ALL = 48, AML = 15, Lymphoma = 16, Histiocytosis = 1, Solid tumor = 66 | IL6, IL8 | 146 | 311 | Significant bacterial infection | Defined as bacteremia, localised infection or pneumonia
Riikonen 1992 | No data given | IL6 IL1, TNF, SAA | 46 | 105 | Bacteremia; suspected sepsis; Focal infection | "No infection" was the default category
Riikonen 1993 | ALL = 20, AML = 24, Solid tumor = 47 | CRP | 46 | 91 | Bacteremia; suspected sepsis; Focal infection | "No infection" was the default category
Santolaya 1994 | Leukemia = 47, Lymphoma = 17, Solid tumor = 11 | CRP | 75 | 85 | Documented bacterial infection; Probable bacterial infection; Viral infection | Documented bacterial infection defined as bacteremia (two sets positive for commensals) or sterile site infection; Probable bacterial infection defined as cultures negative but severe medical course, for ...
... 2001 | ALL = 40%, AML = 8%, Relapsed leukemia = 14%, Lymphoma = 6%, Sarcoma = 17%, Other solid tumor = 15% | CRP | 257 | 447 | Invasive bacterial infection | Defined as positive blood cultures - 2 for CoNS, positive bacterial culture from usually sterile site, or sepsis syndrome ...
... 2002 | ... Lymphoma = 10, Solid tumor = 54 | ... | ... | ... | ... infection | ... cultures - 2 for CoNS, positive bacterial culture from usually sterile site, or sepsis syndrome and/or focal organ involvement and haemodynamic instability and severe malaise
Santolaya 2007 | No overall data given | CRP | 219 | 373 | Death |
Santolaya 2008 | No overall data given on diagnoses | CRP, PCT, IL8 | 278 | 566 | Severe sepsis | Defined as sepsis + respiratory ... of admission
... 2007 | ... = 14, Sarcoma = 7, Histiocytosis = 1, Other solid tumor = 18 | CRP PCR, ESR | 49 | 60 | Bacteremia; Documented bacterial infection (microbiologically or clinically); Duration of fever |
Soker 2001 | ALL = 17, AML = 4, Lymphoma = 2 | IL6, IL8, IL2R, IL1, TNF-alpha | 23 | 48 | Bacteremia |
Spasova 2005 | ALL = 23, AML = 1, Lymphoma = 6, Solid tumor = 11 | CRP, IL6, IL8, IL10 | 24 | 41 | Bacteremia; Microbiologically or clinically proven local infections without bacteremia |
Stryjewski 2005 | ALL = 35, AML = 2, Sarcoma = 8, Other solid tumor = 11 | PCT, IL6, IL8 | 56 | Not stated | Sepsis; Septic shock | Sepsis (positive culture - two consecutive +ve if CoNS, fever, tachycardia, or tachypnoea); septic shock defined as sepsis plus need for inotropes/vasopressors

CRP, C-reactive protein; CoNS, coagulase negative staphylococcus; ESR, erythrocyte sedimentation rate; FUO, fever of unknown origin; IL, interleukin; MCP,
monocyte chemoattractant protein; PCT, pro-calcitonin; SAA, serum amyloid protein A; tADA, t-adenosine deaminase; TNF, tumor necrosis factor



Potentially relevant articles identified from databases: n = 368

Excluded articles: n = 296
  Not cancer = 20
  Not FNP = 105
  Not children = 19
  Not marker = 117
  No appropriate outcomes = 2
  Not testing marker = 31
  Two gate design = 2

Potentially relevant articles assessed in detail: n = 72

Identified from reference lists: n = 7

Excluded articles: n = 52
  Not FNP = 9
  Not children = 7
  Children not extractable = 18
  Not marker = 2
  No appropriate outcomes = 3
  Data not extractable = 1
  Not testing marker = 10
  Duplicate publication = 1
  Original article not available =

Data extraction undertaken: n = 27

'Duplicate' publication: n = 2
  (One erratum for previous article & one study published over 2 articles)

Studies with quantitative data available: n = 25

Included in quantitative synthesis: n = 13

Figure 1 Flow diagram of study selection process

Data handling and analysis

Detailed analysis of the statistical modeling used in the 
original studies revealed potential problems in adjust- 
ment of estimates for other factors, limited event-per- 
variable ratios, poorly described handling of multiple 
episodes and missing data, and use of data-driven 
dichotomies in the reporting of test accuracy. 

Diagnostic test performance 

Data were available for meta-analysis for CRP for micro- 
biologically or clinically documented infection; for PCT 
assessing microbiologically or clinically documented 
infection; and IL6 reporting microbiologically or clini- 
cally documented infection, and gram negative bactere- 
mia. Individual results for these studies and outcomes 
are given in Additional file 3. 

Meta-analysis using the three specified approaches 
illustrated how the standard, simple approach to pooling 
of test accuracy data may be misleading and lead to 
clinically inappropriate conclusions. 

For studies with similar outcomes and where identical 
cut-off values were reported in more than one study, 
meta-analysis was undertaken to calculate a single diag- 
nostic test accuracy estimate using the standard random 
effects bivariate approach: Method 1 (see Table 3 and 
Figure 2). This approach is the commonly applied tech- 
nique, yet does not take into account the inconsistency 
of the full data set (see Methods). 

There is marked heterogeneity in the results of this 
meta-analysis, with sensitivity heterogeneous in all mar- 
kers, and specificity most heterogeneous in PCT and 
CRP. This can be appreciated by comparison of the 
point estimates and confidence intervals in the y (sensi- 
tivity) axis and x (reverse-specificity) axis in Figure 2. 

Using the second approach (Method 2), it was possible to
produce HSROC curves for CRP and PCT to detect 'documented
infection'. No further HSROC curves were derived
as no other combinations of outcome and biomarker 
were available in more than three studies. In this analysis,
the expected ordering of thresholds was not observed, as can
be seen in the example of CRP. Figure 3a shows the
curve without threshold values, and 3b shows how the values
are not in the order expected. The expectation is that a
higher cut-off produces a lower sensitivity and higher
specificity; this is not the case, which makes clinical
interpretation of the curve impossible.
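
The expected ordering can be checked mechanically. The following Python sketch, using hypothetical data points rather than values from the included studies, flags adjacent thresholds where raising the cut-off is accompanied by a higher sensitivity or a lower specificity; the function name is an assumption made for the example.

```python
def ordering_violations(points):
    """Find adjacent thresholds that break the expected ordering.

    points: iterable of (threshold, sensitivity, specificity).
    A higher cut-off is expected to give lower (or equal) sensitivity
    and higher (or equal) specificity. Returns the (lower, higher)
    threshold pairs where this does not hold.
    """
    ordered = sorted(points)  # sort by threshold
    violations = []
    for (t_lo, se_lo, sp_lo), (t_hi, se_hi, sp_hi) in zip(ordered, ordered[1:]):
        if se_hi > se_lo or sp_hi < sp_lo:
            violations.append((t_lo, t_hi))
    return violations


# Hypothetical estimates at three cut-offs, for illustration only.
example = [(20, 0.60, 0.70), (50, 0.75, 0.72), (90, 0.40, 0.90)]
print(ordering_violations(example))
```

When the data points come from different studies, some disagreement is expected by chance, so a flagged pair is a prompt to examine heterogeneity rather than proof of an error.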

The meta-analysis method (Method 3), which maximizes
the use of data, including multiple thresholds from
studies using a multinomial random effects model,
demonstrates that these problems arise because of the 
inconsistencies in the reported data. Again, the CRP 
data are used to demonstrate this (see Figure 4). This 
shows that some of the lower thresholds are less sensi- 
tive than higher thresholds; for example, using a cut-off 
of > 20 mg/dL produced more false negative results 
than a cut-off of > 50 mg/dL. These differences are 
beyond those expected by chance and led to the ana- 
lyses producing clinically meaningless results. This is 
likely to be due to the extreme heterogeneity and sparse 
data. 

Data on the diagnostic value of nine other markers are 
presented in Table 4. IL8 was most frequently described 
[27,38,39]. Most of these studies were exploratory, proposing
new biomarkers and deriving cut-offs, for example,
Monocyte chemoattractant protein-1 or Adenosine
deaminase. The predictive value of these biomarkers is 
also heterogeneous, and subject to potential biases. 

Discussion 

This systematic review of the predictive value of serum 
markers of inflammation and infection in children pre- 
senting with febrile neutropenia found 25 studies report- 
ing 14 different markers. Of these, CRP, PCT, IL6 and 
IL8 were most commonly examined. The finding of a 
diverse range of potentially useful markers, but such lit- 
tle consistency across studies, is unfortunately common 
in such research [40], and may reflect the relative lack 
of coordination in supportive care studies. 

The studies presented similar challenges in reporting, 
methodology and analysis. Whether the test was
interpreted 'blind' to the results of the outcome analysis,
and vice-versa, was very poorly reported. Many studies
failed to assess if the marker had supplementary value 
above the simple admission data collected by clinicians 
at every encounter: age, malignancy, temperature, vital 
statistics and blood count. Analysis of the data was 



Table 3 Bivariate estimates of diagnostic precision of various biomarkers and outcomes 

Marker Outcome Cut-off Sensitivity (95% CI) Specificity (95% CI) 

CRP Documented infection > 50 mg/dl 0.65 (0.41 to 0.84) 0.73 (0.63 to 0.82) 

(7 studies, 731 episodes) 

PCT Documented infection > 0.2 mg/ml 0.96 (0.05 to 0.99) 0.85 (0.53 to 0.97) 

(3 studies, 216 episodes) 

IL6 Documented infection > 235 pg/ml 0.68 (0.15 to 0.96) 0.94 (0.84 to 0.98) 

(3 studies, 457 episodes) 

IL6 Gram negative bacteremia > 1,000 pg/ml 0.78 (0.57 to 0.91) 0.96 (0.92 to 0.99) 

(2 studies, 166 episodes) 
Figure 2 Method 1: bivariate pooled estimates of sensitivity and specificity for CRP, PCT and IL6. The plots indicate individual study
estimates of sensitivity and specificity with 95% confidence intervals demonstrated by dashed lines; the solid lines indicate the meta-analysis
result.



frequently undertaken by episode, with no account of 
multiple admissions for the same patient. Such an analy- 
sis ignores the variation which may be expected from 
genetic polymorphisms for the production of the bio- 
marker under investigation [39], or in individual genetic 



susceptibility to infection [41,42]. The biomarker cut-off 
values reported were frequently derived from the dataset 
to which they were then applied, which is likely to pro- 
duce significant overestimations of accuracy [43]. The 
data were sometimes presented as mean and standard 




Figure 3 Method 2: hierarchical summary receiver operator curve plots of CRP for the diagnosis of documented infection. a) Circles
weighted according to study precision. b) Marker points showing threshold (mg/dl). Horizontal axis in both panels: specificity (1 to 0).
Figure 4 Method 3: ROC space plot of CRP for documented infection (all thresholds)



deviation estimates, from which measures of test accu- 
racy were derived. Although this may raise concerns 
because of the assumption of a normal distribution, 
there is some empiric justification for this procedure 
[44]. 

Quantitative meta-analysis using three approaches 
demonstrated how the commonly used, simple techni- 
ques may fail to reflect inconsistencies in the whole data 
set and so produce misleadingly precise results. The 
example of this review is important to recall when 
appraising other reviews where inconsistencies may not 
have been as extensively investigated. 

The analysis undertaken using only the most com- 
monly reported cut-off in a restricted number of studies 
produced excessively precise results which did not 
reflect the uncertainty of the whole data set, and so 
should be rejected. A similar problem was found with 
the use of data points with different thresholds to pro- 
duce a hierarchical summary receiver operator curve 
(HSROC). The HSROC modelled by these techniques 
does not take into account the actual value of the 
thresholds. This is frequently reasonable: it is impossible 
to quantify the thresholds used by different radiologists 
to call a radiograph 'positive' for pneumonia. In cases 
where the values are known, an ordered relationship 
should be possible to determine, flowing from high to 
low cut-offs from left to right on the curve. This 
ordered relationship did not hold true for analyses of 



CRP and PCT and so should call into question analyses 
in other studies which do not assess whether thresholds 
vary according to the implicit structure of the model. 

A previously developed [8] technique to undertake the 
ordered pooling of all the results was used to attempt to 
overcome these difficulties of only selective use of the 
data, and of incorrect relationships between test thresh- 
olds. This approach failed to produce meaningful results 
for the ability of PCT and CRP to identify patients who 
developed a documented infection, reflecting the incon- 
sistencies and great heterogeneity of the data. 

Some of the observed heterogeneity may be due to 
differences in measurement between apparently similar 
outcomes. While bacteremia is likely to be similarly 
reported across the studies, the diagnosis of a soft-tissue 
infection may vary between clinicians and centers. Very 
few studies reported in detail the exact definitions of the 
outcomes they reported. Further variation may have 
been introduced by the varying definitions of fever and 
neutropenia. In this review, 20 different combinations of 
criteria were used to define febrile neutropenia. These 
data could not be directly assessed to explore their rela- 
tionship with the diagnostic value of the biomarkers, but 
as the depth of neutropenia, and the peak and duration of
temperature, may affect the generation of biomarkers,
the variation may further account for some of the het- 
erogeneity. Additionally, although the assay techniques 
used in the studies were reported to be similar, there 



Phillips et al. BMC Medicine 2012, 10:6
http://www.biomedcentral.com/1741-7015/10/6



Table 4 Estimates of diagnostic precision of various markers and outcomes in single studies.

Citation | Marker and Cutpoint | Outcome | Sensitivity (95% CI) | Specificity (95% CI)
Santolaya 2008 | IL8, 200 | Sepsis | 0.49 (0.4 to 0.58) | 0.71 (0.67 to 0.75)
Diepold 2008 | IL8, 30 | Prolonged illness | 0.87 (0.78 to 0.93) | 0.61 (0.42 to 0.76)
Diepold 2008 | IL8, 90 | Bacterial infection | 0.64 (0.39 to 0.84) | 0.62 (0.52 to 0.71)
El-Maghraby 2007 | IL8, 62 | Documented infection | 0.71 (0.59 to 0.81) | 0.77 (0.58 to 0.89)
Lehrnbecher 2004 | IL8, 320 | Documented infection | 0.56 (0.46 to 0.65) | 0.79 (0.73 to 0.84)
Lehrnbecher 2004 | IL8, 500 | Documented infection | 0.44 (0.35 to 0.54) | 0.89 (0.84 to 0.93)
El-Maghraby 2007 | MCP, 350 | Documented infection | 0.64 (0.52 to 0.75) | 0.92 (0.76 to 0.98)
Hitoglou-Hatzi 2005 | tADA, 35 U/l | Significant bacterial infection | 1.0 (0.88 to 1.0) | 1.0 (0.91 to 1.0)
Riikonen 1992 | TNF, 40 | Bacteremia or focal infection | 1.0 (0.88 to 1.0) | 0.07 (0.03 to 0.15)
Hodge 2006 | IL5, 17 | Positive blood culture | 0.5 (0.22 to 0.79) | Could not calculate
Hodge 2006 | IL5 and 8 combined, > 17 and > 220 | Positive blood culture | 1.0 (0.68 to 1.0) | 0.87 (0.68 to 0.96)
Soker 2001 | IL-2R | Bacteremia | Median (range) 5,230 U/mL (1,120 to 7,600) | 1,190 (724 to 5,400)
Soker 2001 | TNF-alpha | Bacteremia | 8.4 (4.0 to 68.2) | 7.8 (3.0 to 37.2)
Secmeer 2007 | ESR | Bacteremia | "not statistically significantly different between patients with and without documented infection"






was no calibration of assays across the various studies. 
Other differences in the populations studied, such as the 
nature of the malignancies, recent surgical interventions 
and duration of therapy, may also add heterogeneity to 
interpreting markers which are themselves affected by a 
malignant disease. A more prosaic reason for heterogeneity 
may be publication bias: the tendency for reports 
demonstrating good predictive value to be published more 
readily than those showing poor discrimination [45-47]. 

In order to interpret the information from this 
review in a clinically meaningful way, both the esti- 
mates of predictive effectiveness and the uncertainty 
that surrounds these estimates need to be taken into 
account. CRP has been most extensively studied in this 
setting; it is a ubiquitous test and the only one which 
has been shown to add to the predictive ability of 
clinically-based decision rules [26,34]. These studies 
chose two differing cut-offs (> 50 mg/dl [26] or > 90 
mg/dl [34]). CRP is at best only moderately discriminatory 
in the setting of detecting documented infection 
(sensitivity 0.65, 95% CI 0.41 to 0.84; specificity 0.73, 
95% CI 0.63 to 0.82), which is in keeping with 
estimates drawn from its value in the detection of serious 
bacterial infection in non-neutropenic children 
[48], and may be a significant overestimation of its 
value. The clinical role of CRP as a screening tool may 
be limited, however, if another biomarker is shown to 
be a more discriminatory test. 
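
To make the phrase "moderately discriminatory" concrete, the pooled CRP estimates quoted above can be converted into likelihood ratios and post-test probabilities. The following is a minimal Python sketch of that standard arithmetic and is not part of the review's own analysis; the function names and the pre-test probability of 20% are illustrative assumptions and are not taken from the included studies.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test dichotomised at a single cut-off."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Pooled point estimates for CRP and documented infection, as quoted in the text.
CRP_SENSITIVITY = 0.65
CRP_SPECIFICITY = 0.73

# Hypothetical pre-test probability, chosen only for illustration.
ASSUMED_PRE_TEST_PROBABILITY = 0.20

lr_positive, lr_negative = likelihood_ratios(CRP_SENSITIVITY, CRP_SPECIFICITY)
probability_if_raised = post_test_probability(ASSUMED_PRE_TEST_PROBABILITY, lr_positive)
probability_if_normal = post_test_probability(ASSUMED_PRE_TEST_PROBABILITY, lr_negative)

print(f"LR+ = {lr_positive:.2f}, LR- = {lr_negative:.2f}")
print(f"Post-test probability, raised CRP: {probability_if_raised:.2f}")
print(f"Post-test probability, normal CRP: {probability_if_normal:.2f}")
```

With these point estimates, the positive likelihood ratio is roughly 2.4 and the negative likelihood ratio roughly 0.48, values that shift the probability of infection only modestly in either direction. The calculation uses point estimates only and ignores the wide confidence intervals around them.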

Data from this review and meta-analytic comparisons 
of CRP and PCT in the non-neutropenic population 
[49] are suggestive of the improved predictive value of 
PCT over CRP. This has a strong pathophysiological 
basis, as PCT levels are reported to rise within 3 to 4 
hours in response to infection as compared with the 24 
to 48 hours required for CRP [33]. However, the data 
for the improved predictive value of PCT are quite var- 
ied (see Additional file 3 and previously published 
reviews [13]). This may be related to the degree of neu- 
tropenia, as reports from the post-transplant setting 
have shown disappointingly poor discrimination [50], or 
this again may be due to small studies and publication 
bias [47,51]. Based on the data from this review, 
procalcitonin cannot yet be recommended for use in routine 
clinical practice. 

Similar pathophysiological claims for improved predictive 
ability can be advanced for IL6 and IL8 [52]. In this 
review, IL6 level shows potential to be a better discriminator 
than CRP of those children who will develop a 
serious infectious complication. IL8 also appears to have 
moderate discriminatory ability and has been used in 
combination with clinical data in a small pilot study to 
withhold antibiotics from a highly selected group of patients 
with febrile neutropenia [53]. Both of these cytokines 
show promise and should be subject to further 
investigation. 

Given the very limited data available for other poten- 
tial biomarkers of infection in the setting of pediatric 
febrile neutropenia identified by this review, no strong 
clinical conclusions for their use can be reached without 
further studies. 

These conclusions are drawn from an extensive and 
detailed systematic review of the available evidence 
using advanced techniques of meta-analysis, supplemen- 
ted by rational clinical and pathophysiological reasoning. 
It should be clearly understood that they are uncertain 
and unstable, as even small amounts of new data may 
substantially alter these findings. 
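
The dependence of such estimates on sample size can be illustrated with a confidence interval for a proportion such as sensitivity. The sketch below, in Python, uses the Wilson score interval; this choice is an assumption made for illustration, as the interval method used for Table 4 is not restated here, and all counts are hypothetical rather than drawn from any included study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    proportion = successes / total
    denominator = 1.0 + z * z / total
    centre = (proportion + z * z / (2.0 * total)) / denominator
    half_width = (
        z
        * math.sqrt(proportion * (1.0 - proportion) / total + z * z / (4.0 * total * total))
        / denominator
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts: the same observed sensitivity of 0.5 in a small
# and in a large group of episodes with documented infection.
small_study = wilson_interval(10, 20)
large_study = wilson_interval(100, 200)

small_width = small_study[1] - small_study[0]
large_width = large_study[1] - large_study[0]

print(f"10/20:   {small_study[0]:.2f} to {small_study[1]:.2f}")
print(f"100/200: {large_study[0]:.2f} to {large_study[1]:.2f}")
```

Under these hypothetical counts, the same observed sensitivity of 0.5 carries an interval roughly three times wider from 20 episodes with infection than from 200, which is why single small studies such as those in Table 4 cannot support firm conclusions.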

Conclusions 

This review demonstrates flaws in our current under- 
standing of the value of biomarkers in the prediction of 
adverse outcomes from episodes of febrile neutropenia, 
but also provides us with clear opportunities for devel- 
opment. All further investigation should estimate the 
additional value of biomarker measurements, beyond 
the discrimination already achieved by clinical variables. 
This should take into account key features of the treat- 
ment, for example, stem-cell transplantation and any 
clinically defined risk stratification already undertaken. 

Further investigation should include the use of individual patient data (IPD) 
meta-analysis, which should allow the effective added- 
value of markers to be measured when the best clinical 
data have been taken into account in differing sub- 
groups. Such a venture is in progress [54]. The biomar- 
kers IL6, IL8 and PCT appear promising, and should 
certainly be subject to new primary studies investigating 
more thoroughly the prediction of significant infectious 
morbidity, which includes both clearly defined infections 
and the sepsis syndrome, across a variety of clinical set- 
tings. By developing harmonized definitions of outcomes 
for such studies, greater confidence could be placed 
upon their results. The new SIOP Supportive Care 
group is ideally placed to lead on such a venture, and 
allow pediatric oncology/hematology to once more push 
the boundaries of international, collaborative clinical 
research. 